Online computer system with methodologies for distributed trace aggregation and for targeted distributed tracing

ABSTRACT

An online distributed computer system with methodologies for distributed trace aggregation and targeted distributed tracing. In one aspect, the disclosed distributed tracing technologies improve on existing distributed tracing technologies by providing to application developers and site operations personnel a more holistic and comprehensive insight into the behavior of the online distributed computer system in the form of computed span metric aggregates displayed in a graphical user interface, thereby making it easier for such personnel to diagnose problems in the system and to support and maintain the system. In another aspect, the disclosed distributed tracing technologies improve on existing distributed tracing technologies by facilitating targeted tracing of initiator requests.

TECHNICAL FIELD

The present invention relates generally to online distributed computer systems and, more particularly, to tracing user requests processed by such systems.

BACKGROUND

The first web sites were largely implemented with only two “tiers” of computers. Requests from users' web browsers were mainly handled by a first tier of web server computers. In some instances, a web server computer in the first tier would need to request data from a database server in the second tier of computers in order to formulate and send an appropriate response to a user's request. Over time, a third tier, commonly known as the “application tier”, was added in between the web server tier and the database tier. In both cases, tracing user requests through the various server tiers was relatively simple because the overall distributed computer environment was limited and clearly defined.

With the ever-increasing popularity of the Internet, more and more online services are implemented as complex, large-scale distributed computer systems. Modern online services can have hundreds of applications executing on thousands of computing devices in multiple data center facilities. Management and execution of all of the various applications is typically facilitated by server “virtualization”. Virtualization allows multiple “virtual” servers (i.e., instances) to execute the applications at one or more levels above the host computing devices. In the last several years, virtualization has become pervasive and is used by online service providers to more easily and rapidly provision new computing resources to meet user demand.

As more and more online services become facilitated by virtualization, a whole new set of challenges faces providers of online services: these previously small-scale, well-understood computer environments are now N-tier distributed computer systems executing hundreds of applications across thousands of instances in multiple data centers, with new applications and application upgrades constantly being added. A particular set of challenges involves understanding system behavior and being able to reason about performance issues and system failures.

Some online service providers have, in response, introduced technologies to provide application developers with more information about the behavior of these complex distributed systems in which there are large collections of server computing devices, including “tracing”, logging, and similar technologies, all designed to capture information about the runtime behavior of a computer system. “Tracing” is a technology for capturing and recording information about a software system's execution.

One tracing technology proposed for distributed computer systems is known as “Dapper” and is described in the paper “Dapper, a large-scale distributed systems tracing infrastructure”, Benjamin H. Sigelman, Luiz Andre Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, Chandan Shanbhag, Google Technical Report dapper-2010-1, April 2010, the entire contents of which are hereby incorporated by reference as if fully set forth herein. For example, Dapper technology can be used to trace randomly selected user requests. However, despite the effectiveness of tracing randomly selected user requests, issues remain in providing developers with information about the behavior of online distributed computer systems.

One particular problem that remains is how to provide a more holistic and comprehensive picture of the distributed system behavior. For example, a trace of a user request may indicate which applications of the online service were invoked to handle the user request. However, a single trace may provide little to no insight into where the performance hotspots in the system are over a period of time.

Another particular problem that remains is how to selectively trace certain user requests. For example, a user may report an error when making a certain request of an online service. In this case, the online service provider may wish to trace a selected subset of all subsequent user requests of the online service in order to diagnose the root cause of the error. For example, the online service provider may wish to trace all subsequent user requests from the user that reported the error. Such targeted tracing is not possible with a distributed tracing technology that traces only randomly selected user requests.

Accordingly, there is a need for distributed system tracing technologies that provide a more holistic and comprehensive picture of online distributed system behavior and for distributed tracing technologies that allow targeted tracing of user requests. Such technologies increase the effectiveness and efficiency of application developer and system administrator activities like maintaining and troubleshooting applications in an online distributed computer system.

The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

SUMMARY

The above deficiencies and other problems associated with distributed tracing technologies for online distributed computer systems are reduced or eliminated by the disclosed distributed tracing technologies.

A first aspect of the distributed tracing technologies of the present invention includes, for example, a method for distributed trace aggregation in an online distributed computer system. The method comprises the steps of generating trace events at a plurality of system nodes of the online distributed computer system for a plurality of request paths. Each trace event is generated for a corresponding one of the request paths and for a corresponding span of the corresponding request path. The corresponding span represents computation performed by the system node at which the trace event is generated on behalf of an interprocess communication call from a parent span in the corresponding request path. The parent span corresponds to one of the system nodes in the corresponding request path. The method further includes the steps of collecting the generated trace events from the system nodes; identifying a subset of the collected trace events pertaining to a particular system node; computing a span metric aggregate from span metrics in the subset of trace events; displaying, in a graphical user interface, a graphical representation of the particular system node; and displaying, in the graphical user interface, the span metric aggregate in conjunction with the display of the graphical representation of the particular system node.

The first aspect of the disclosed distributed tracing technologies improves existing distributed tracing technologies by providing to application developers and site operations personnel a more holistic and comprehensive insight into the behavior of the online distributed computer system in the form of computed span metric aggregates displayed in a graphical user interface, thereby making it easier for such personnel to diagnose problems in the system and to support and maintain the system.

A second aspect of the distributed tracing technologies of the present invention includes, for example, a method for targeted distributed tracing in an online distributed computer system. The method comprises the steps of receiving a targeted trace query at an edge node of the online distributed computer system; receiving an initiator request at the edge node; evaluating the query against the initiator request; and enabling distributed tracing of the initiator request if the initiator request satisfies the query.

The second aspect of the disclosed distributed tracing technologies improves existing distributed tracing technologies by facilitating precise targeted tracing of initiator requests.

These and other aspects of the disclosed technologies of the present invention are described in greater detail below with reference to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a very general block diagram of an example computing device in which the disclosed technologies may be embodied.

FIG. 2 is a block diagram of an example online distributed computer system in which the disclosed technologies may be implemented.

FIG. 3 illustrates an example of the fan-out of a request path through nodes of an online distributed computer system on behalf of a request from an initiator.

FIG. 4 is a block diagram of an example of how distributed tracing in an online distributed computer system may be facilitated through standard library instrumentation.

FIG. 5 illustrates an example request path tree.

FIG. 6 is a flowchart illustrating the overall operation of distributed trace aggregation according to an embodiment of the present invention.

FIG. 7 illustrates an example relation for storing collected trace events.

FIG. 8 is a block diagram of an example data pipeline computer system for collecting trace events.

FIG. 9 is an example call graph that may be presented to a user in a computer graphical user interface according to an embodiment of the present invention.

FIG. 10 is a flow diagram illustrating an example of targeted distributed tracing.

DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

1.0 Basic Computing Environment

The below-described basic computer environment is presented for purposes of illustrating the basic underlying computer components that may be employed for implementing the disclosed technologies. For purposes of discussion, the following description will present certain examples in which it will be assumed that “server” computing devices receive requests from remote “client” computing devices. The present invention, however, is not limited to any particular computer environment or computer system configuration. In particular, a client/server distinction is not necessary to the invention, but is used merely to provide a framework for discussion. Instead, the disclosed technologies may be implemented in any type of computer system architecture or computer environment capable of supporting the disclosed technologies presented in detail here, including peer-to-peer configurations and the like.

1.1 Implementing Mechanism (Hardware Overview)

The disclosed technologies may be implemented on one or more computing devices. Such a computing device may be implemented in various forms including, but not limited to, a client, a server, a network device, a mobile device, a laptop computer, a desktop computer, a workstation computer, a personal digital assistant, a blade server, a mainframe computer, and other types of computers. The computing device described below and its components, including their connections, relationships, and functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosed technologies described in this specification. Other computing devices suitable for implementing the disclosed technologies of the present invention may have different components, including components with different connections, relationships, and functions.

FIG. 1 is a block diagram that illustrates an example of a computing device 100 suitable for implementing the disclosed technologies. Computing device 100 includes a bus 102 or other communication mechanism for communicating information, and a hardware processor 104 coupled with bus 102 for processing information. Hardware processor 104 may be, for example, a general purpose microprocessor or a system on a chip (SoC).

Computing device 100 also includes a main memory 106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 102 for storing information and instructions to be executed by processor 104. Main memory 106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Such instructions, when stored in non-transitory storage media accessible to processor 104, render computing device 100 into a special-purpose computing device that is customized to perform the operations specified in the instructions.

Computing device 100 further includes a read only memory (ROM) 108 or other static storage device coupled to bus 102 for storing static information and instructions for processor 104.

A storage device 110, such as a magnetic disk, optical disk, or solid-state drive, is provided and coupled to bus 102 for storing information and instructions.

Computing device 100 may be coupled via bus 102 to a display 112, such as a liquid crystal display (LCD) or other electronic visual display, for displaying information to a computer user. Display 112 may also be a touch-sensitive display for communicating touch gesture (e.g., finger or stylus) input to processor 104.

An input device 114, including alphanumeric and other keys, is coupled to bus 102 for communicating information and command selections to processor 104.

Another type of user input device is cursor control 116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 104 and for controlling cursor movement on display 112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

Computing device 100 may implement the methods described herein using customized hard-wired logic, one or more application-specific integrated circuits (ASICs), one or more field-programmable gate arrays (FPGAs), firmware, or program logic which, in combination with the computing device, causes or programs computing device 100 to be a special-purpose machine.

Methods disclosed herein may also be performed by computing device 100 in response to processor 104 executing one or more sequences of one or more instructions contained in main memory 106. Such instructions may be read into main memory 106 from another storage medium, such as storage device 110. Execution of the sequences of instructions contained in main memory 106 causes processor 104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a computing device to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as storage device 110. Volatile media includes dynamic memory, such as main memory 106. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.

Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 104 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computing device 100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 102. Bus 102 carries the data to main memory 106, from which processor 104 retrieves and executes the instructions. The instructions received by main memory 106 may optionally be stored on storage device 110 either before or after execution by processor 104.

Computing device 100 also includes a communication interface 118 coupled to bus 102. Communication interface 118 provides a two-way data communication coupling to a network link 120 that is connected to a local network 122. For example, communication interface 118 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 118 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 120 typically provides data communication through one or more networks to other data devices. For example, network link 120 may provide a connection through local network 122 to a host computer 124 or to data equipment operated by an Internet Service Provider (ISP) 126. ISP 126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 128. Local network 122 and Internet 128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 120 and through communication interface 118, which carry the digital data to and from computing device 100, are example forms of transmission media.

Computing device 100 can send messages and receive data, including program code, through the network(s), network link 120 and communication interface 118. In the Internet example, a server 130 might transmit a requested code for an application program through Internet 128, ISP 126, local network 122 and communication interface 118.

The received code may be executed by processor 104 as it is received, and/or stored in storage device 110 or other non-volatile storage for later execution.

1.2 Online Distributed Computer System

While the disclosed technologies may operate within a single standalone computing device (e.g., device 100 of FIG. 1), the disclosed technologies may be implemented in an online distributed computer system. FIG. 2 is a block diagram of an example online distributed computer system 200 in which the disclosed technologies may be implemented.

As shown, system 200 is provided for hosting an Internet service. System 200, which may be distributed across one or more data center or hosting facilities, includes server computers 202. Server computers 202 constitute the hardware layer of system 200. Servers 202 provide the computer hardware (e.g., processors 104, memory 106, storage devices 110, and communication interfaces 118) for executing software and for communicating with other computers over data networks.

Typically, servers 202 within a data center and across data centers will be communicatively coupled to one another by Internet Protocol (IP) data networks. IP is the principal network communications protocol in the Internet protocol suite for relaying data packets within and across network boundaries.

Some of the servers 202 are also communicatively coupled to client devices 212 by the Internet 210, which is also an IP data network. These servers, sometimes referred to as “edge servers” because of their network proximity to the Internet 210 relative to other servers of the system, can receive network requests from and return network responses to the client devices 212. Typically, the network requests from the client devices 212 are made according to an application-level protocol of the Internet protocol suite such as, for example, the Hyper-Text Transfer Protocol (HTTP), or a cryptographically secure variant thereof (e.g., HTTPS). Often, the client devices 212 are end-user devices of users of the Internet service. For example, client devices 212 may include laptop computers, desktop computers, cell phone computers, smart phone computers, tablet computers, set-top computers, gaming computers, and so forth.

A “virtualization” layer is provided on top of the hardware layer by virtual machine monitors (hypervisors) 204 that execute on the servers (host machines) 202. Virtual machine monitors 204 instantiate and run virtual machine instances (guest machines) 206. Each instance 206 comprises a “guest” operating system and one or more applications 208 designed to execute on the guest operating system. A virtual machine monitor 204 presents the guest operating systems of the instances 206 with a virtual operating platform and manages the execution of the guest operating systems.

In some instances, a virtual machine monitor 204 may allow a guest operating system to run as though it is running on the hardware and CPUs of a server 202 directly. In these instances, the same version of the guest operating system configured to execute on a server 202 directly may also be able to execute on a virtual machine monitor 204 without modification or reconfiguration. In other words, a virtual machine monitor 204 may provide full hardware and CPU virtualization to a guest operating system in some instances.

In other instances, a guest operating system may be specially designed or configured to execute on a virtual machine monitor 204 for efficiency. In these instances, the guest operating system is “aware” that it executes on a virtual machine monitor. In other words, a virtual machine monitor 204 may provide para-virtualization to a guest operating system in some instances.

A guest operating system is typically provided for controlling the operation of the virtual machine instance 206 it is executing on. The guest operating system, which is usually stored in main memory 106 and on fixed storage (e.g., hard disk) 110, manages low-level aspects of instance 206 operation, including managing execution of processes, memory allocation, file and network input and output (I/O), and device I/O. The guest operating system can be provided by a conventional operating system such as, for example, MICROSOFT WINDOWS, SUN SOLARIS, LINUX, UNIX, IOS, ANDROID, and so forth.

One or more applications 208, such as server software, daemons, “programs”, or sets of processor-executable instructions, may also be provided for execution on instances 206. The application(s) may be “loaded” into main memory 106 from storage 110 or may be downloaded from a network location (e.g., an Internet web server). A graphical user interface (GUI) is typically provided for receiving user commands and data in a graphical (e.g., “point-and-click” or “touch gesture”) fashion. In addition or alternatively, a command line interface may be provided. These inputs, in turn, may be acted upon by the instance 206 in accordance with instructions from the guest operating system and/or application(s) 208. The graphical user interface also serves to display the results of operation from the guest operating system and application(s) 208. Applications 208 may implement various functionality of the Internet service including, but not limited to, web server functionality, application server functionality, database server functionality, analytic functionality, indexing functionality, data warehousing functionality, reporting functionality, messaging functionality, and so forth.

2.0 Distributed Tracing Technology System Components

The disclosed technologies provide distributed trace aggregation and targeted distributed tracing in an online distributed computer system. The distributed trace aggregation and targeted distributed tracing technologies can be used individually or together. Thus, the disclosed technologies do not require that if one of the technologies is used, the other must be used as well.

Nonetheless, the distributed trace aggregation and targeted distributed tracing technologies are built on a number of underlying distributed tracing technologies. Accordingly, the underlying distributed tracing technologies will be described first, followed by descriptions of the technologies particular to distributed trace aggregation and targeted distributed tracing.

2.1 Request Paths

An edge server in an online distributed computer system that receives a request from a client device may distribute the request to multiple other servers in the system. Each of those servers may in turn distribute the requests they receive from the edge server to still other servers, and so on. In other words, the single request from the client device to the edge server may cause a “fan-out” of multiple levels of multiple requests within the system.

For example, a front-end web server of an Internet streaming video service may distribute a request from a user's client device for available videos to a number of other sub-servers that generate personalized video recommendations, determine the user's geographic location, retrieve video box-art graphics and images, and so forth. Results from all of the sub-servers may then be selectively combined in a web page or other response returned to the client device. In total, many servers and applications may be needed to process a single user request.

More formally, a single network request from an “initiator” can cause a number of “interprocess communication calls” between “nodes” of the online distributed computer system. In other words, the request can have a path through the system starting at the edge node that initially receives the request and traversing one or more other nodes of the system via the interprocess communication calls.

As used herein, the term “node”, in the context of an online distributed computer system, refers to any of an executing instance of an application (e.g., 208), a group or cluster of multiple application instances, a virtual machine instance (e.g., 206), a group or cluster of multiple virtual machine instances, a server (e.g., 202), a group or cluster of servers, or some combination of the foregoing. Nodes may be separately identifiable by unique identifiers (e.g., names) with which the application(s) of the system may be configured. For example, the application(s) may be pre-configured with node identifiers or may be so configured at runtime. Thus, each executing application instance may be aware of the unique identifier or name of the node it belongs to, or that it is, if the application instance is a node. In this description, an identifier or a name of a node is referred to herein as a “node identifier”, “node name”, or “node id”.

The “initiator” is typically a client computing device of a user of the Internet service that sends the request to the Internet service in response to some interaction by the user with the client device (e.g., clicking a link on a web page). However, the initiator can be a server computing device or an unattended computing device that sends requests to the service autonomously.

An “interprocess communication call”, or just “IPC call” for short, is typically made from one node to another node. The IPC call is typically made over an IP data network and according to an application-level IP networking protocol such as, for example, HTTP. An IPC call is typically associated with a calling node that formulates and sends the IPC call to a callee node. Upon receiving the IPC call, the callee node processes the IPC call and formulates and sends an IPC reply back to the calling node. Each of the IPC call and the IPC reply may comprise one or more IP data packets. Typically, the IPC call requests information from the callee node and the IPC reply provides the requested information to the calling node. However, an IPC call can also be used to provide information to the callee node. Accordingly, an IPC reply may simply be an acknowledgement that the information was received.

For example, FIG. 3 illustrates a simple example of the fan-out of a request path through nodes of an online distributed computer system 200 on behalf of a request 302 from an initiator. As shown, the request 302 is sent from the initiator over the Internet 210 and received at an edge node 304A of the online distributed computer system 200. This causes edge node 304A to make two interprocess communication calls, one to node 304B and the other to node 304C. The IPC call from edge node 304A to node 304C causes node 304C in turn to make two more interprocess communication calls, one to node 304D and the other to node 304E. After the edge node 304A has received the IPC call replies from nodes 304B and 304C, the edge node 304A prepares a response 306 based on the replies and sends it to the initiator. Note that edge node 304A may not receive an IPC call reply from node 304C until after nodes 304D and 304E have replied to node 304C.

One simple way to measure the performance of the system 200 from the end-user perspective is to measure the amount of time between when the request 302 is fully received at the edge node 304A and when the response 306 is fully sent from the edge node 304A. If this time is too long, it may indicate that there is a problem in the system 200. While this simple measurement may indicate problematic system performance issues, it does not by itself indicate which node(s) 304 are causing the poor performance, or even which nodes are in the request path. Thus, engineers and developers would appreciate technologies that provide them with the tools to more effectively diagnose and identify the root cause of poor system performance.

2.2 Distributed Tracing Instrumentation Points

One way to capture information about the request path through the system for a given initiator request is to generate a trace event whenever an application instance a) sends an IPC call, b) receives an IPC call, c) sends an IPC reply, or d) receives an IPC reply. The request path can then be reconstructed from the trace events that the application instances generate while processing the initiator's request.

To do this reconstruction, unique “trace identifiers” are used. In particular, the trace identifiers can be assigned to initiator requests at the edge nodes that receive the initiator requests. Then, the trace events caused by application instances processing an initiator request can be generated to include the trace identifier assigned to the initiator request. The trace identifiers in the trace events can then be used to associate the trace events with the particular initiator requests that caused them to be generated. Trace identifiers and other distributed tracing metadata may be propagated between application instances in IPC calls.

These distributed tracing functions may be facilitated through instrumentation at select points of a standard or core software library used by the applications, which provides basic IPC, threading, and initiator request/response handling functionality to the higher-level functionality of the applications. In this context, instrumentation refers to specific software programming language instructions designed to facilitate distributed trace logging such as, for example, by composing and generating a trace event. By instrumenting these basic libraries, the distributed tracing can be transparent to the higher-level functionality. In particular, developers of the higher-level functionality may not need to concern themselves with enabling distributed tracing in the higher-level functionality or otherwise concern themselves with how distributed tracing is accomplished. Simply by using the standard software library as a building block for the higher-level functionality in an application, distributed tracing is enabled.

FIG. 4 is a block diagram showing an example of how distributed tracing may be facilitated through instrumentation of a standard software library used by applications in an online distributed computer system. In this example, system 200 comprises executing application instance A, executing application instance B, and executing application instance C. Some or all of the applications executing in the system may be built upon a standard software library that provides lower-level functionality to the higher-level functionality (business logic) of the applications. For example, executing application instance A comprising high-level application logic A uses standard software library 402, executing application instance B comprising high-level application logic B also uses standard software library 402, and executing application instance C comprising high-level application logic C also uses standard software library 402.

Typically, the standard software library acts as a software “cushion” for the applications between the high-level application logic and the guest operating system that allows the developers of the high-level application logic to reason about and develop the high-level application functionality without having to be overly concerned about the particulars of the underlying guest operating system(s) the applications execute on. For example, standard library 402 may provide IPC, thread management, and initiator request/response handling services to the high-level application logic A.

The standard library used by the application can be instrumented at select execution points to facilitate distributed tracing. In particular, the standard library can be instrumented to generate at least one trace event, and possibly perform other distributed tracing functions, at some or all of the following execution points (see the sketch following the list):

1. At any edge node of the online distributed system, when a network request is received from an initiator;
2. At any edge node, when a response to a network request is sent to an initiator;
3. At any calling node, when an IPC call is sent from the calling node to a callee node;
4. At any callee node, when an IPC call is received from a calling node;
5. At any callee node, when an IPC reply is sent from the callee node to the calling node; and
6. At any calling node, when an IPC reply is received at the calling node from a callee node.
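For illustration only, the following Python sketch simulates the six execution points in a single process. The patent does not specify an implementation language; names such as emit, handle_initiator_request, and ipc_call are hypothetical, IPC is modeled as a plain function call, and emitted trace events simply accumulate in a list.

```python
import random

EVENTS = []  # stand-in for a trace event log consumed by a collector

def emit(msg_type, ctx):
    # Compose a trace event from the current trace context (compare the
    # example schema in Table 1 below).
    EVENTS.append({"trace_id": ctx["trace_id"],
                   "parent_span_id": ctx.get("parent_span_id"),
                   "span_id": ctx["span_id"],
                   "trace_msg_type": msg_type})

def handle_initiator_request(request, app_logic):
    # Execution point 1: a network request is received at an edge node.
    ctx = {"trace_id": "%016x" % random.getrandbits(64),  # 64-bit pseudo-random
           "span_id": 1}  # edge span; note there is no parent_span_id
    emit("request_recv", ctx)
    response = app_logic(request, ctx)
    # Execution point 2: the response is sent back to the initiator.
    emit("response_sent", ctx)
    return response

def ipc_call(ctx, callee_logic, payload):
    # Execution point 3: the calling node sends an IPC call; the trace
    # identifier and the caller's span identifier ride along in the call.
    emit("call_sent", ctx)
    reply = _handle_ipc_call(ctx["trace_id"], ctx["span_id"],
                             callee_logic, payload)
    # Execution point 6: the calling node receives the IPC reply.
    emit("reply_recv", ctx)
    return reply

def _handle_ipc_call(trace_id, parent_span_id, callee_logic, payload):
    # Execution point 4: the callee node receives the IPC call and derives
    # its own span identifier (here by incrementing the caller's, one of
    # the schemes described below; a real library must keep span ids
    # unique within the trace).
    ctx = {"trace_id": trace_id,
           "parent_span_id": parent_span_id,
           "span_id": parent_span_id + 1}
    emit("call_recv", ctx)
    reply = callee_logic(payload, ctx)
    # Execution point 5: the callee node sends the IPC reply.
    emit("reply_sent", ctx)
    return reply
```

Running handle_initiator_request("GET /videos", lambda req, ctx: ipc_call(ctx, lambda p, c: "ok", req)) yields six trace events sharing one trace_id, in the order request_recv, call_sent, call_recv, reply_sent, reply_recv, response_sent.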

For example, in the request path of FIG. 4, standard library 402 can be configured to generate a trace event at each of execution points 408, 410, 412, 414, 416, 418, 420, 422, 424, and 426. An example trace event schema to which generated trace events may conform is presented in the next section.

In the current example, execution point 408 corresponds to item 1 above: when a network request is received at an edge node from an initiator. Execution point 426 corresponds to item 2 above: when a response to a network request is sent from an edge node to an initiator. Execution points 410 and 414 correspond to item 3 above: when an IPC call is sent from a calling node to a callee node. Execution points 412 and 416 represent item 4 above: when an IPC call is received from a calling node. Execution points 418 and 422 correspond to item 5 above: when an IPC reply is sent from a callee node to a calling node. And execution points 420 and 424 correspond to item 6 above: when an IPC reply is received at a calling node from a callee node. The distributed tracing actions performed at these various execution points will now be discussed in greater detail.

1. At any edge node of the online distributed system, when a network request is received from an initiator.

When a network request from an initiator is received at an edge node, the standard library of the edge node may be configured to generate and assign an identifier to the initiator's request. This assigned identifier is referred to herein as a “trace identifier”.

A trace identifier assigned to an initiator's request uniquely identifies the request and the consequent request path through the nodes of the online distributed system. For example, in addition to generating a trace event at execution point 408, standard library 402 can also be configured to generate and assign at execution point 408 a trace identifier to the incoming network request from the initiator. In the context of trace identifiers, unique means probabilistically unique among all trace identifiers assigned to initiator requests within a given period of time. For example, the trace identifier may be a 64-bit pseudo-randomly generated number, although a true random number generator may be used instead.
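As a minimal sketch (assuming Python and the hypothetical helper name new_trace_id), a 64-bit pseudo-random trace identifier of the kind described can be produced as follows:

```python
import random

def new_trace_id():
    # 64-bit pseudo-randomly generated number, rendered as 16 hex digits;
    # probabilistically unique among trace identifiers assigned within a
    # given period of time.
    return "%016x" % random.getrandbits(64)

print(new_trace_id())  # e.g., '77562efa8f141c07', the form used in Table 2 below
```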

Other distributed tracing metadata in addition to the trace identifier may be generated by the standard libraries of the nodes. In particular, two additional pieces of distributed tracing metadata may be generated.

A first piece is referred to herein as a “span identifier”. In the context of a request path as identified by its assigned trace identifier, a span identifier is an identifier of a “span” in the request path. A span identifier uniquely identifies a span at least among all spans in a request path identified by a trace identifier. Thus, the combination of a trace identifier and a span identifier uniquely identifies a span. As used herein, the term “span” refers to the computational work, in terms of amount of data and/or processing time, performed by a node processing a single request from an initiator, if the node is an edge node, or processing a single IPC call from a calling node, if the node is a callee node.

A second piece of distributed tracing metadata is referred to herein as a “parent span identifier”. Relative to a callee node in a request path as identified by its assigned trace identifier, a parent span identifier is the span identifier of the calling node in the request path.

Assigned span identifiers may be propagated between nodes in a request path in IPC calls between the nodes. For example, referring again to FIG. 4, at execution point 408, standard library 402 of application instance A may assign a span identifier in addition to generating and assigning a trace identifier to the initiator request.

A trace event may also be generated at execution point 408 with the trace identifier of the current request path, no parent span identifier (which indicates that the node is an edge node), and a span identifier of the edge node. Alternatively, the trace event may include a parent span identifier of zero (0) or NULL or some other predefined value to indicate that the node is an edge node and does not have a parent span.

The trace identifier and the span identifier form part of the “trace context”. When an application instance handles a traced execution path (i.e., a traced thread of execution), the trace context can be propagated within the application instance in thread-local storage. For example, at execution point 408, standard library 402 of application instance A may store a trace context in thread-local storage comprising the assigned trace identifier and the assigned span identifier. By doing so, the current trace context can be retrieved from thread-local storage at other execution points along the traced execution path. For example, a trace context added to thread-local storage at execution point 408 can be retrieved from thread-local storage at execution points 410, 424, and 426.

In the situation where the traced execution path is asynchronous or deferred, involving a callback between different threads of execution within the application instance, a callback handler module in the standard library for facilitating the callback can retrieve the trace context from the thread-local storage of the calling thread and store the trace context in the thread-local storage of the called thread when the callback is invoked.
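A sketch of both behaviors, using Python's threading.local for thread-local storage and a wrapper standing in for the callback handler module (the function names here are illustrative, not the patent's):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_tls = threading.local()

def set_trace_context(ctx):
    _tls.trace_context = ctx

def get_trace_context():
    return getattr(_tls, "trace_context", None)

def submit_callback(executor, callback, *args):
    # The callback may run on a different thread with its own thread-local
    # storage, so capture the calling thread's trace context now...
    ctx = get_trace_context()
    def wrapper(*a):
        # ...and store it in the called thread's thread-local storage
        # before the callback body runs.
        set_trace_context(ctx)
        return callback(*a)
    return executor.submit(wrapper, *args)

# Usage: the context set on the calling thread is visible in the callback.
set_trace_context({"trace_id": "77562efa8f141c07", "span_id": 1})
with ThreadPoolExecutor(max_workers=1) as pool:
    future = submit_callback(pool, lambda: get_trace_context()["span_id"])
    assert future.result() == 1
```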

2. At any edge node, when a response to a network request is sent to an initiator.

When a response to an initiator's request is sent from an edge node, the standard library of the edge node can generate a trace event pertaining to this event. For example, standard library 402 of application instance A can generate a trace event at execution point 426. The generated trace event can include the trace identifier of the current request path, no parent span identifier (or a parent span identifier of zero or NULL or some other predefined value to indicate there is no parent span), and the span identifier of the edge node.

3. At any calling node, when an IPC call is sent from the calling node to a callee node.

When a calling node sends an IPC call to a callee node, in addition to generating a trace event, the calling node may provide in the IPC call to the callee node the trace identifier of the current request path and the span identifier of the calling node. For example, at execution point 410, the standard library 402 of application instance A may include the trace identifier and the span identifier generated at execution point 408 in the outgoing IPC call to application instance B. An analogous operation may be performed by the standard library 402 of application instance B at execution point 414 in the IPC call to application instance C. The generated trace event may include the trace identifier of the current request path, a parent span identifier, if appropriate (i.e., the span identifier of the node that called the calling node, if any), and the span identifier of the calling node.

4. At any callee node, when an IPC call is received from a calling node.

When a callee node receives an IPC call from a calling node, in addition to generating a trace event, the callee node may generate a span identifier for itself. The span identifier received in the IPC call is the span identifier of the calling node. In other words, the span identifier received in the IPC call is the parent span identifier for the callee node.

For example, at execution point 410, the standard library 402 of application instance A may send an IPC call that includes the trace identifier and the span identifier that were assigned to the request path and the edge node, respectively, at execution point 408. When the IPC call is received at application instance B at execution point 412, the standard library 402 of application instance B can obtain the trace identifier and the span identifier of the calling node from the IPC call. Also at execution point 412, a trace context may be generated comprising a) the trace identifier obtained from the IPC call, b) the parent span identifier obtained from the IPC call, and c) the span identifier generated for the callee node. The span identifier is generated at the callee node and should be unique at least within the current trace. For example, the callee node may generate a span identifier for itself by incrementing the span identifier received in the IPC call from the calling node by a fixed amount (e.g., one (1)). Once generated, the callee node can add the trace context to thread-local storage for propagation to other execution points 414, 420, and 422 within application instance B.

Also at execution point 412, standard library 402 of application instance B may generate a trace event comprising the trace identifier of the current request path, the parent span identifier of the calling node, and the span identifier of the callee node.

Operations analogous to those described above performed at execution point 412 may also be performed at execution point 416 of standard library 402 when application instance C receives the IPC call from application instance B. At execution point 416, the parent span identifier is the span identifier generated at execution point 412 by standard library 402 of application instance B and propagated in the IPC call to application instance C at execution point 414.

5. At any callee node, when an IPC reply is sent from the callee node to the calling node.

When a callee node sends an IPC reply to a calling node, the standard library of the callee node can generate a trace event pertaining to this event. For example, standard library 402 of application instance C can generate a trace event at instrumentation point 418. The generated trace event can include the trace identifier of the current request path, the parent span identifier of the calling node, and the span identifier of the callee node. Analogously, a trace event may also be generated at execution point 422 when application instance B sends an IPC reply to application instance A.

6. At any calling node, when an IPC reply is received at the calling node from a callee node.

When a calling node receives an IPC reply from a callee node, the standard library of the calling node can generate a trace event pertaining to this event. For example, standard library 402 of application instance B can generate a trace event at execution point 420. The generated trace event can include the trace identifier of the current request path, the parent span identifier, if appropriate, and the span identifier of the calling node. Analogously, a trace event may also be generated at instrumentation point 424 when application instance A receives an IPC reply from application instance B. In this case, there may be no parent span identifier in the generated trace event (or the parent span identifier may be zero or NULL or some other value to indicate that the calling node does not have a parent).

2.3 Trace Event Schema

Trace events (messages) generated by the standard libraries of application instances may conform to a trace event schema. Generally, the schema of a trace event is a set of name-value pairs or properties. Some of the properties are required; other properties are optional.

Table 1 below provides an example trace event schema with a non-exclusive list of possible properties. Properties other than those listed may be included in a trace event that conforms to the example schema. The first (leftmost) column lists the names of the properties. The second column provides a short description of the values of the properties. The third column indicates which properties are required and which are optional. The fourth (rightmost) column lists which types of trace events the properties may be included in.

TABLE 1
Example Trace Event Schema

trace_id (required; all trace event types)
    Globally unique identifier of an initiator request for which the trace event is generated. The trace_id also uniquely identifies the request path through the online distributed computer system resulting from the initiator request.

parent_span_id (optional; all trace event types)
    An identifier of the calling node, if any, that called the node that generated the trace event. If the node that generates the trace event is an edge node, the parent_span_id property may be absent. Alternatively, the value of the parent_span_id property may be zero or NULL or some other predefined value to indicate that the node that generates the trace event is an edge node.

span_id (required; all trace event types)
    An identifier of the span that generated the trace event.

node_id (optional; all trace event types)
    Name or identifier of the node that generated the trace event. This may be, for example, an assigned service name, an application name, a cluster name, a server name, an auto scaling group name, etc.

trace_msg_type (optional; all trace event types)
    The type of the trace event indicating the event for which the trace event is generated. Can be one of a number of predefined types including, but not limited to:
    1. request_recv: when a request is received at an edge node from an initiator.
    2. response_sent: when a response to a request is sent from an edge node to an initiator.
    3. call_sent: when an IPC call is sent from a calling node to a callee node.
    4. call_recv: when an IPC call from a calling node is received at a callee node.
    5. reply_sent: when an IPC call reply is sent from a callee node to a calling node.
    6. reply_recv: when an IPC call reply from a callee node is received at a calling node.

data (optional; all trace event types)
    Any text, error, or binary information particular to the trace event.

error_code (optional; response_sent or reply_sent trace events)
    Any application-specific error code (e.g., an HTTP error code) sent in a response to an initiator or in an IPC call reply to a calling node.

client_side_total_time (optional; reply_recv trace events)
    Total time between when the node that generated the trace event sends an IPC call and when the node receives the reply to the IPC call.

server_side_total_time (optional; reply_sent trace events)
    Total time between when the node that generated the trace event receives an IPC call from a calling node and when the node sends an IPC reply to the calling node. If the node that generated the trace event is an edge node, then this is the total time between when the edge node receives a request from an initiator and when the edge node sends the response to the request to the initiator.
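For concreteness, a single trace event conforming to this schema might be rendered as follows (a hypothetical Python representation; the schema does not prescribe a concrete encoding, and the node_id value is invented). The values match trace event 5 of Table 3 below:

```python
trace_event = {
    "trace_id": "77562efa8f141c07",        # required; identifies the request path
    "parent_span_id": 1,                   # optional; span of the calling node
    "span_id": 3,                          # required; span that generated the event
    "node_id": "recommendations-service",  # optional; hypothetical node name
    "trace_msg_type": "reply_sent",        # an IPC call reply was sent
    "error_code": "200 OK",                # allowed on reply_sent events
    "server_side_total_time": 815,         # milliseconds spent handling the call
}
```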

According to the above example trace event schema, the trace_id and span_id properties are required in a trace event. The node_id and parent_span_id properties are optional.

The optional error_code property can be included in trace events generated when an edge node sends a response to an initiator or when a callee node sends an IPC call reply to a calling node. For example, the error_code property may be used to store in the trace event the value of an HTTP status error code associated with the request or reply, such as an HTTP status error code of 400 or above.

The optional client_side_total_time property can be included in trace events generated when a calling node receives an IPC call reply from a callee node. For example, the client_side_total_time property may be used to store in the trace event a stopwatch time between when the calling node sent the IPC call to the callee node and when the calling node received the IPC call reply from the callee node.

The optional server_side_total_time property can be included in trace events generated when an edge node receives a request from an initiator or when a callee node receives an IPC call from a calling node. For example, the server_side_total_time property may be used to store in the trace event a stopwatch time between when the edge node/callee node receives the request/IPC call and when the edge node/callee node sends the response/IPC call reply.

In addition to or as an alternative to the client_side_total_time and the server_side_total_time properties, trace event timestamps may be used to calculate the time spent by a calling node between sending an IPC call and receiving an IPC reply and the time spent by a callee node processing a received IPC call. In particular, a current date/time timestamp can be included in generated trace events. In this case, the time spent by a calling node between sending an IPC call and receiving an IPC reply can be computed as the difference between the timestamp in the trace event generated when the calling node receives the IPC reply and the timestamp in the trace event generated when the calling node sends the IPC call. Similarly, the time spent by a callee node between receiving an IPC call and sending an IPC reply can be computed as the difference between the timestamps in the two corresponding trace events generated by the callee node. In both cases there may be no need to account for clock drift, as the difference computation may involve timestamps generated relative to the same system clock.
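A sketch of the timestamp-difference computation (assuming Python and that each trace event dict carries a hypothetical 'timestamp' field taken from its own node's clock, so each subtraction involves a single clock):

```python
def client_side_total_time(call_sent_event, reply_recv_event):
    # Both events were generated by the calling node against the same
    # system clock, so the difference needs no clock drift correction.
    return reply_recv_event["timestamp"] - call_sent_event["timestamp"]

def server_side_total_time(recv_event, sent_event):
    # recv_event/sent_event are call_recv/reply_sent at a callee node, or
    # request_recv/response_sent at an edge node; again one clock only.
    return sent_event["timestamp"] - recv_event["timestamp"]
```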

The above example schema lists just some of the possible properties that may be included in a trace event. Other properties in addition to or instead of those listed above are possible. For example, other required and optional properties may be included in a trace event. Further, properties in a trace event may be named differently according to the requirements of the particular implementation at hand; the names of the properties listed above are merely exemplary and not limiting of the possible trace event schemas that are compatible with the disclosed technologies.

2.4 Spans

Information in generated trace events can be used to reconstruct request paths. More specifically, the values of the trace_id, parent_span_id, and span_id properties in a set of trace events can be used. The subset of a set of trace events that pertain to a particular initiator request path all have the same value for the trace_id property. Within this subset, trace events generated by the edge span in the request path can be identified by the absence of the parent_span_id property or, alternatively, the presence of the parent_span_id property having a predefined value indicating that the trace events were generated by the edge span such as, for example, zero (0) or NULL. The trace events generated by “child” spans called by the edge span, if any, can be identified in the subset as having a value for the parent_span_id property that equals the value of the span_id property identified in the trace events generated by the edge span. Steps analogous to this step may be repeated in a depth-first or breadth-first manner for each of the child, grandchild, great-grandchild, etc. spans of the edge span, if there are any, until all trace events in the subset have been accounted for. When complete, all spans of the request path tree and their IPC call dependencies will have been identified.
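The reconstruction procedure just described can be sketched as follows (Python; assumes trace events are dicts using the Table 1 property names and that an absent or zero parent_span_id marks the edge span):

```python
from collections import defaultdict

def reconstruct_request_path(events, trace_id):
    # Keep only the subset of trace events for this request path.
    subset = [e for e in events if e["trace_id"] == trace_id]
    # Index child span ids by parent span id; edge-span events, which
    # have no parent, are grouped under 0.
    children = defaultdict(set)
    for e in subset:
        children[e.get("parent_span_id") or 0].add(e["span_id"])
    # Walk depth-first from the edge span(s), recovering the tree of
    # IPC call dependencies as nested dicts.
    def subtree(span_id):
        return {child: subtree(child) for child in sorted(children[span_id])}
    return {edge: subtree(edge) for edge in sorted(children[0])}
```

Applied to the five trace events of Table 2 below, this returns {1: {2: {}, 3: {4: {}, 5: {}}}}, i.e., the tree of FIG. 5.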

For example, FIG. 5 illustrates an example request path tree 500 that can be reconstructed from trace events. This example is based on the example request path illustrated in FIG. 3 and assumes at least the following trace events are generated for the request path. The values in the trace event number column are for reference and may not actually be included in the trace events. Other values are intentionally omitted for clarity.

TABLE 2
Example Trace Events

Trace event #  trace_id          parent_span_id  span_id
1              77562efa8f141c07  0               1
2              77562efa8f141c07  1               2
3              77562efa8f141c07  1               3
4              77562efa8f141c07  3               4
5              77562efa8f141c07  3               5

As shown in FIG. 5, request path tree 500 has five (5) spans. Span 1 has two direct IPC call dependencies, one on span 2 and one on span 3. Note that the value of the trace_id property is the same for all of the trace events, indicating that they were all caused by the same initiator request and are part of the same request path.

Interesting performance metric information from the trace events can be associated with the spans when reconstructing the request path tree from the trace events. In particular, the values of the error_code, client_side_total_time, and server_side_total_time properties in the trace events can be associated with the spans. For example, information in the following example trace events may be used to associate interesting performance metric information with the spans of the example request path tree 500. The values in the trace event number column are for reference and may not actually be included in the trace events. Further, the values in the trace event number column do not necessarily correspond to the values in the trace event number column of Table 2 above. Other values are intentionally omitted for clarity.

TABLE 3
Example Trace Events

Trace event #  span_id  type                          error_code  client_side_total_time (ms)  server_side_total_time (ms)
1              1        reply_recv (from span 2)      —           55                           —
2              1        reply_recv (from span 3)      —           825                          —
3              1        response_sent (to initiator)  200 OK      —                            900
4              2        reply_sent (to span 1)        200 OK      —                            50
5              3        reply_sent (to span 1)        200 OK      —                            815
6              3        reply_recv (from span 4)      —           775                          —
7              3        reply_recv (from span 5)      —           27                           —
8              4        reply_sent (to span 3)        200 OK      —                            700
9              5        reply_sent (to span 3)        200 OK      —                            20

From these trace events, it can be determined that it took approximately 900 milliseconds for span 1 to handle the initiator's request. It can also be determined that most of the time spent handling the request was spent by span 4, which took approximately 700 milliseconds to handle the IPC call from span 3. In this example, the error_code values returned in the IPC call replies from spans 2, 3, 4, and 5 and in the response from span 1 were all HTTP status codes of 200 OK. Alternatively, the error_code values can be from other application-level protocols (e.g., SMTP) if an application-level protocol other than HTTP is used for interprocess communication between nodes. Even if HTTP is used for interprocess communication, the error_code values could be other than 200 OK. For example, if an error occurred in span 4 handling the IPC call from span 3, the error_code value could be, for example, 500 SERVER ERROR instead of 200 OK.
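As an illustration of how these values localize where time was spent, the following sketch computes each span's “self time” (its server-side total minus the server-side totals of its direct children) from the Table 3 values; the term “self time” and the assumption that the IPC calls are made sequentially are ours, not the patent's:

```python
# server_side_total_time values (ms) from the reply_sent/response_sent
# events in Table 3, keyed by span_id.
totals = {1: 900, 2: 50, 3: 815, 4: 700, 5: 20}
# Direct IPC call dependencies from Table 2 / FIG. 5.
children = {1: [2, 3], 2: [], 3: [4, 5], 4: [], 5: []}

self_time = {span: totals[span] - sum(totals[c] for c in children[span])
             for span in totals}
print(self_time)  # {1: 35, 2: 50, 3: 95, 4: 700, 5: 20}; span 4 dominates
```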

3.0 Distributed Trace Aggregation

With the above distributed tracing technologies in mind, some further distributed trace aggregation technologies will now be described. As mentioned above, the distributed trace aggregation technologies make it easier for developers, engineers, and other system technicians to support and maintain an online distributed computer system.

In an embodiment, the distributed trace aggregation technologies include a computer-implemented method performed by one or more computing devices for distributed trace aggregation. FIG. 6 is a flowchart 600 illustrating the overall operation of distributed trace aggregation in an online distributed computer system according to an embodiment of the disclosed technologies. As to the flowchart 600, each block within the flowchart represents both a method step and an apparatus element for performing the method step. Depending upon the implementation, the corresponding apparatus element may be configured in hardware, software, firmware, or combinations thereof.

3.1 Generating Trace Events

At step 602, trace events are generated at a plurality of system nodes of the online distributed computer system for a plurality of request paths. Each of the request paths may correspond to an initiator request received at an edge node of the online system. Each of the request paths may traverse one or more nodes of the online system before a response is returned from the edge node to the initiator. Each trace event is generated for a corresponding one of the request paths and for a corresponding span of the corresponding request path. For example, for a given request path, a trace event may be generated at any of the following times: 1) at the edge node in the request path, when the initiator request is received; 2) at the edge node in the request path, when the response to the initiator request is sent to the initiator; 3) at each calling node in the request path, when the calling node sends an IPC call to a callee node; 4) at each callee node, when an IPC call is received from a calling node; 5) at each callee node, when an IPC reply is sent to a calling node; and 6) at each calling node, when an IPC reply is received from a callee node.
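
For illustration, the six generation points enumerated above can be sketched in Python as follows. The emit() helper and the event type name “call_sent” are assumptions made for illustration; the other type names follow the examples in this description.

    def emit(event_type, span_id):
        # Stand-in for generating a trace event (see the schema above).
        print(f"trace event: type={event_type} span_id={span_id}")

    def handle_ipc_call(span_id):
        emit("call_recv", span_id)           # 4) callee receives an IPC call
        emit("reply_sent", span_id)          # 5) callee sends an IPC reply

    def call_downstream(caller_span_id, callee_span_id):
        emit("call_sent", caller_span_id)    # 3) calling node sends an IPC call
        handle_ipc_call(callee_span_id)
        emit("reply_recv", caller_span_id)   # 6) calling node receives the reply

    def handle_initiator_request(edge_span_id):
        emit("request_recv", edge_span_id)   # 1) edge node receives the request
        call_downstream(edge_span_id, edge_span_id + 1)
        emit("response_sent", edge_span_id)  # 2) edge node sends the response

    handle_initiator_request(1)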

In some embodiments, trace events are generated at system nodes by standard libraries of executing application instances at predefined execution points. The execution points are predefined in the sense that the standard libraries used by application instances are configured or designed to generate the trace events at the execution points when the application instances are executed. For example, referring again to FIG. 4, trace events can be generated by standard library 402 at execution points 408, 410, 412, 414, 416, 418, 420, 422, 424, and 426.

Generating a trace event includes generating and/or collecting and storing trace event data in computer memory (e.g., RAM 106). The trace event data may comprise, for example, any data conforming to the trace event schema described above. Collecting trace event data may include retrieving one or more of a trace identifier, a parent span identifier, and a span identifier from thread local storage.
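
A minimal Python sketch of this step, assuming a simple thread-local trace context; the field and function names are illustrative rather than the disclosed schema.

    import threading
    import time

    _tls = threading.local()

    def set_trace_context(trace_id, parent_span_id, span_id):
        _tls.trace_id = trace_id
        _tls.parent_span_id = parent_span_id
        _tls.span_id = span_id

    def generate_trace_event(event_type, **extra):
        # Trace event data is assembled entirely in volatile memory; nothing
        # here touches non-volatile storage.
        return {
            "type": event_type,
            "trace_id": _tls.trace_id,
            "parent_span_id": _tls.parent_span_id,
            "span_id": _tls.span_id,
            "timestamp_ms": int(time.time() * 1000),
            **extra,
        }

    set_trace_context("77562efa8f141c07", 0, 1)
    event = generate_trace_event("request_recv")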

3.2 Collecting Trace Events

At step 604, trace events generated at system nodes are collected. In an embodiment, the trace events are collected during a sample period, which may include a continuous period of time or multiple discontinuous periods of time. For example, the sample period may correspond to a period of time such as, for instance, a twenty-four hour period of time or other suitable length of time for collecting a sufficient number of trace events for the purpose of computing span metric aggregates. In an embodiment, the sample period is user configurable, for instance, through a graphical user interface, a command line interface, a configuration file, or other computer interface. Instead of a period of time, the sample period may be defined by other criteria such as a number of trace events or based on user input. In the case of user input, the start of the sample period may correspond to first user input that indicates when to begin collecting trace events, and the end of the sample period to second user input that indicates when to stop collecting trace events. For example, a user interface may be provided with VCR-like controls that allow a user to begin “recording” (collecting) trace events for a sample period, pause collection of trace events during the sample period, resume (un-pause) collection of trace events during the sample period, and stop collecting trace events for the sample period.

Collecting trace events may include persistently storing trace events in a database. FIG. 7 illustrates an example relation 702 for collecting trace events in a database. In relation 702, the rows correspond to request paths and the columns correspond to spans.

In an embodiment, a collected trace event (or information thereof) is stored in one cell of relation 702 based on the value of the trace_id property and the value of the span_id property of the trace event. Some cells of relation 702 may be empty (i.e., not store any trace event information) if no trace event with a trace_id property value and a span_id property value corresponding to the row and column of the cell, respectively, has been collected. When a trace event is collected from a system node, the value of the trace_id property in the trace event can be used to determine the row of relation 702 in which information in the trace event is to be stored. If the row does not yet exist, the row is added using the value of the trace_id property in the trace event as the row key. The value of the span_id property in the trace event determines the column of the row (i.e., the cell) in which the trace event information is stored. If the column does not yet exist, the column is added using the value of the span_id property as the column key. Each cell in relation 702 may in fact hold a list of values, one value for each trace event received with the same trace_id property value and the same span_id property value. Each value in the cell's list can include the parent_span_id from the trace event, the value of the error_code property, the value of the client_side_total_time property, and the value of the server_side_total_time property, among other possible information in the trace event.
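
For illustration, the following Python sketch models relation 702 as a nested mapping, with trace_id as the row key, span_id as the column key, and a list of values per cell; a production system would back this with a database system such as those named below.

    from collections import defaultdict

    # row key: trace_id; column key: span_id; cell: list of trace event values
    relation = defaultdict(lambda: defaultdict(list))

    def store_trace_event(event):
        cell = relation[event["trace_id"]][event["span_id"]]
        cell.append({
            "parent_span_id": event.get("parent_span_id"),
            "error_code": event.get("error_code"),
            "client_side_total_time": event.get("client_side_total_time"),
            "server_side_total_time": event.get("server_side_total_time"),
        })

    store_trace_event({"trace_id": "77562efa8f141c07", "span_id": 1,
                       "parent_span_id": 0, "error_code": "200 OK",
                       "server_side_total_time": 900})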

In an embodiment, relation 702 is managed by a database system. In an exemplary embodiment, the managing database system is an instance of the APACHE HBASE database system or an instance of the APACHE CASSANDRA database system. However, other types of database systems may be used to manage relation 702 such as, for example, a relational database management system, according to the requirements of the particular implementation at hand. Thus, the managing database system is not limited to any particular database system or particular type of database system.

Further, relation 702 is just one example relation for storing collected trace events. Other database structures may be used in other embodiments. For example, collected trace events may be stored in multiple relations instead of just a single relation.

A typical online service can receive 10,000 initiator requests per second or more. Given the volume of initiator requests an online service may receive, tracing each initiator request received by the service may be impractical or undesirable due to the amount of trace event information that would be generated. To limit the amount of trace event information generated, only one out of every N requests may be traced. For example, N may be 100. Thus, instead of tracing 10,000 initiator requests per second, 100 requests per second are traced.

The selection of which initiator requests to trace can be made at the edge nodes based on a simple running counter of the number of initiator requests received. If an initiator request is selected for tracing, this fact can be communicated to other nodes in the request path so that trace events are generated only for selected requests. For example, the trace context stored in thread local storage and sent in IPC calls can have a field or value indicating whether tracing is enabled for the current request. This field or value can be checked at the various execution points in the standard libraries to determine whether or not a trace event should be generated at the execution point.
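
A minimal sketch of this selection scheme, assuming a per-edge-node counter and a tracing_enabled field in the trace context (the names are illustrative):

    import itertools

    N = 100  # trace one out of every N initiator requests (N is illustrative)
    _request_counter = itertools.count(1)

    def collect(event):
        print("collected:", event)  # stand-in for handing off to a daemon

    def make_trace_context(trace_id):
        # The selection decision is made once, at the edge node, and then
        # rides along in the trace context to downstream nodes.
        sampled = next(_request_counter) % N == 0
        return {"trace_id": trace_id, "tracing_enabled": sampled}

    def maybe_emit(event, trace_context):
        # Checked at each predefined execution point in the standard library.
        if trace_context["tracing_enabled"]:
            collect(event)

    ctx = make_trace_context("77562efa8f141c07")
    maybe_emit({"type": "request_recv"}, ctx)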

Even if only a small percentage (e.g., 1%) of all initiator requests are traced, a significant number of trace events may still need to be collected. For example, if 100 initiator requests per second are selected for tracing and there are on average 2 to 4 nodes in each request path and each node in the request path generates on average 2 to 4 trace events per initiator request, then there are, on average, between 400 and 1600 trace events generated every second.

One way to efficiently collect this volume of trace events for computing span metric aggregates is to avoid persisting the trace events as much as possible. Typically, reading and writing data to non-volatile data storage media (e.g., hard disk) is much slower than reading and writing data to volatile data storage media (e.g., main memory). Thus, avoiding writing trace events to non-volatile media can improve the performance of technologies for collecting, from the nodes, the trace events that are used in span metric aggregate computation.

One solution to avoid persisting trace events is to use a data pipeline to move trace events generated at the nodes in the online distributed computer system to an aggregation engine that computes the span metric aggregates. This is illustrated in FIG. 8.

In particular, FIG. 8 is a block diagram of an online distributed computer system that includes a data pipeline 802 to move trace events from application instances 804 that generate them to consumer applications 806 that use them, including aggregation engine 808 and search engine 810.

Trace events generated by application instances 804 are sent by the application instances 804 to trace event daemons 812. A trace event daemon 812 may execute within the process space of an application instance 804, for example, in a thread of the process space. Alternatively, the daemon 812 may execute in a separate process space. Further, some daemons 812 may execute within the process space of an application instance 804 and some daemons 812 may execute in separate process spaces.

Application instances 804 do not need to persist the trace events they generate before sending them to daemons 812. Instead, application instances 804 send trace events generated in volatile memory to a daemon 812 over a communication channel. Each daemon 812 may be configured with an in-memory queue, in volatile memory, for storing trace events until they can be sent to the data pipeline 802. Thus, daemons 812 also do not need to persist trace events. It should be noted that loss of some generated trace events may be acceptable. Thus, daemons 812 may discard trace events received from application instances 804 if their in-memory queues are currently full or above a threshold.
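
For illustration, a trace event daemon's drop-rather-than-persist queue behavior might be sketched as follows; the queue size and batch size are assumed tuning parameters, not part of the disclosure.

    import queue

    class TraceEventDaemon:
        def __init__(self, max_events=10000):
            # Held only in volatile memory; never written to disk.
            self._queue = queue.Queue(maxsize=max_events)

        def submit(self, event):
            try:
                self._queue.put_nowait(event)  # never blocks the application
            except queue.Full:
                pass  # losing some trace events is acceptable; drop them

        def drain(self, batch_size=500):
            batch = []
            for _ in range(batch_size):
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            return batch  # forwarded on to the data pipeline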

The ratio of daemons 812 to application instances 804 can be as high as one-to-one. However, the ratio can be much less than one according to the requirements of the particular implementation at hand.

The trace event communication channels between an application instance 804 and a daemon 812 can be an interthread communication channel, if the application instance 804 and the daemon 812 execute in the same process space, or an interprocess communication channel, if the application instance 804 and the daemon 812 execute in different processes on the same instance or on different instances connected by a data network. It is expected, but not required, that the daemons 812 are connected to the data pipeline 802 by a data network such as, for example, an IP data network.

Data pipeline 802 processes trace events asynchronously in stages. Each stage comprises a queue and a pool of worker threads that consume trace events asynchronously from the queue, process them, and send them to the next stage. The main processing flow of trace events in the data pipeline 802 includes HTTP server(s) 814 that receive trace events sent from the daemons 812 via a remote procedure call (RPC) over HTTP mechanism.
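
A minimal sketch of one such stage, assuming each stage owns a queue and a pool of worker threads; the stage wiring shown in the trailing comment is hypothetical.

    import queue
    import threading

    class PipelineStage:
        def __init__(self, process, next_stage=None, workers=4):
            self.inbox = queue.Queue()
            self._process = process
            self._next = next_stage
            for _ in range(workers):
                threading.Thread(target=self._run, daemon=True).start()

        def _run(self):
            while True:
                event = self.inbox.get()      # consume asynchronously
                result = self._process(event)
                if self._next is not None:    # hand off to the next stage
                    self._next.inbox.put(result)

    # Hypothetical wiring: HTTP server stage -> message router stage -> sinks.
    # sink_stage = PipelineStage(send_to_sink)
    # router_stage = PipelineStage(route_message, next_stage=sink_stage)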

The HTTP server(s) 814 pass received trace events on to a message router 816. The message router 816 determines which message sink 818 the trace events should be routed to. Each message sink 818 that receives trace events sends the trace events to a corresponding sink system 820. One of the message sinks 818 may be for sending trace events to a publication-subscription messaging system 821. The publication-subscription messaging system 821 may employ a distributed commit log for persistently storing the trace events received from the corresponding message sink 818. The publication-subscription messaging system 821 may be the first place trace events are persisted. That is, trace events generated at application instances 804 may move from the application instances 804 to the daemons 812 and through the data pipeline 802 without being stored in a non-volatile data storage medium until they are received at the publication-subscription messaging system 821. In this way, large volumes of trace events can be collected from distributed application instances in a timely fashion.

Publication-subscription messaging system 821 may store a backlog of recent trace events received from the data pipeline 802. The backlog may correspond to a period of time (e.g., the past 48 hours), a certain number of trace events, or the data storage units consumed by the stored trace events, and so forth. Publication-subscription messaging system 821 publishes stored trace events to consumer applications 806, possibly including a search engine 810 and an aggregation engine 808. Search engine 810 may provide a user interface for indexing, querying, and viewing individual trace events. Aggregation engine 808 computes aggregates of spans identified in the trace events and provides a user interface including a call graph, as described in greater detail in the next section. In an exemplary embodiment, aggregation engine 808 comprises an executing instance of the DRUID open-source software for computing span metric aggregates.

3.3 Identifying Subsets of Trace Events Pertaining to Nodes

Returning to FIG. 6, at step 606, subsets of collected trace events are identified for the purpose of computing span metric aggregates. Such identification may be based on the trace identifier, span identifier, and parent span identifier in the collected trace events.

A first type of subset that may be identified is all collected trace events that pertain to a particular request path (i.e., a particular initiator request). Collected trace events that belong to one of this type of subset all have the same value for the trace identifier property (e.g., the trace_id property described above). Thus, all collected trace events that pertain to a particular request path can be identified by their common trace identifier value. A set of trace events identified in a set of collected trace events that pertain to a particular request path may be referred to herein as a “request path subset”. One or more request path subsets can be identified in a set of collected trace events based on the trace identifier value in the trace events in the set of collected trace events. For each such request path subset, all trace events in the request path subset may have the same value for the trace identifier property.

Within a request path subset of trace events, one or more “span” subsets can be identified based on the value of the span identifier property (e.g., the span_id property described above) in the trace events in the request path subset. In particular, all trace events in a request path subset that pertain to a particular span of that request path have the same value for the span identifier property. Thus, all trace events in a request path subset that pertain to a particular span of that request path can be identified by their common span identifier value. One or more span subsets can be identified in a request path subset of trace events based on the span identifier value in the trace events in the request path subset. For each such span subset, all trace events in the span subset may have the same value for the span identifier property.
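
For illustration, request path subsets and span subsets can be identified with a single grouping pass, sketched here in Python under the property names described above:

    from collections import defaultdict

    def identify_subsets(collected_events):
        # request path subsets keyed by trace_id; span subsets keyed by span_id
        request_paths = defaultdict(lambda: defaultdict(list))
        for event in collected_events:
            request_paths[event["trace_id"]][event["span_id"]].append(event)
        return request_paths

    subsets = identify_subsets([
        {"trace_id": "t1", "span_id": 1, "type": "request_recv"},
        {"trace_id": "t1", "span_id": 1, "type": "response_sent"},
        {"trace_id": "t1", "span_id": 2, "type": "reply_sent"},
    ])
    # subsets["t1"] is the request path subset for trace identifier "t1";
    # subsets["t1"][1] is the span subset for span 1 of that request path.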

A span identifier in a trace event uniquely identifies a span within a particular request path identified by the trace identifier of the trace event. However, the span identifier is not required to be unique across multiple request paths. Further, a span identifier for one request path and a span identifier with the same value for another request path may not necessarily correspond to the same node in the online system. Thus, a span identifier of a span in a request path is meaningful only in the context of that request path.

In an embodiment, to provide more useful span metric aggregates, the aggregates are computed for nodes, which are typically identified by well-known, human readable node identifiers. Such node identifiers can be presented in a graphical user interface in conjunction with display of span metric aggregates computed for the nodes to provide meaningful and helpful information to a user viewing the graphical user interface.

Accordingly, another type of subset that can be identified is all collected trace events that pertain to a particular node in the online distributed computer system. To identify this type of subset, span subsets are resolved to node identifiers. To resolve a span subset to a node identifier, the trace events in the span subset are examined for a value of the node identifier property (e.g., the node_id property described above). Typically, all trace events in a span subset that have a node identifier property will have the same value for the node identifier property. However, not all trace events in a span subset may have a node identifier property. For example, only one of the trace events in a span subset may have a node identifier property. This resolution may be performed for multiple span subsets across multiple request paths. As a result, the trace events in multiple span subsets across multiple request paths may be resolved to the same node identifier. Also as a result, multiple node identifiers may each be associated with a subset of collected trace events that pertain to that node. A set of trace events identified in a set of collected trace events that pertain to a particular node may be referred to herein as a “node subset”. One or more node subsets can be identified in a set of collected trace events based on the trace identifier values, the span identifier values, and node identifier values in the trace events in the set of collected trace events. For each such node subset, all trace events in the node subset that have a node identifier property may have the same value for the node identifier property.

As mentioned above, a span identifier in a trace event may have meaning only in the context of a particular request path. In other words, a span identifier in a trace event may uniquely identify a span in a request path (i.e., the request path identified by the trace identifier in the trace event) that the span is a part of but not in any other request path. Similarly, a parent span identifier in a trace event may uniquely identify a span in a request path that the span is a part of but not in any other request path. Nonetheless, it can be useful to resolve parent span identifiers in trace events to node identifiers. For example, in addition to computing a span metric aggregate for a particular node subset of trace events, it may be useful to compute a span metric aggregate for just the trace events in the particular node subset associated with a particular parent node. By doing so, performance of the particular node handling all IPC calls during a sample period can be compared to the performance of the particular node handling just the IPC calls from the particular parent node during the sample period. Such comparison may be helpful in identifying if IPC calls from the particular parent node are a significant cause of poor performance of the particular node.

In an embodiment, to resolve parent span identifiers to node identifiers, all unique span identifiers in all span subsets in a given request path subset are resolved to node identifiers. Then, the parent span identifiers in the span subsets in the given request path are resolved to node identifiers based on the span identifier resolutions. For example, if span identifier ‘4’ in a request path identified by trace identifier ‘abc656b2a23d42be’ is resolved to node identifier ‘web_server_1’, then a parent span identifier of ‘4’ in a trace event with the same trace identifier of ‘abc656b2a23d42be’ can also be resolved to ‘web_server_1’.
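
A minimal Python sketch of this two-pass resolution, assuming trace events are plain mappings and that a node_id property appears in at least one trace event per span subset:

    def resolve_nodes(request_path_events):
        # Pass 1: map each span identifier to a node identifier, taken from
        # whichever trace events in the span subset carry a node_id property.
        span_to_node = {}
        for event in request_path_events:
            if "node_id" in event:
                span_to_node[event["span_id"]] = event["node_id"]
        # Pass 2: reuse the map for parent span identifiers, so that a
        # parent_span_id of '4' resolves to whatever node span '4' resolved to.
        for event in request_path_events:
            event["resolved_node_id"] = span_to_node.get(event["span_id"])
            event["parent_node_id"] = span_to_node.get(event.get("parent_span_id"))
        return request_path_events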

By resolving parent span identifiers in trace events belonging to a node subset to node identifiers, trace events in the node subset can be sub-divided by parent node identifier. A set of trace events identified in a set of collected trace events that pertain to a particular node and a particular parent node may be referred to herein as a “parent-node subset”. One or more parent-node subsets can be identified in a set of collected trace events based on the trace identifier values, the span identifier values, the parent span identifier values, and node identifier values in the trace events in the set of collected trace events. For each such parent-node subset, all trace events in the parent-node subset that have a node identifier property may have the same value for the node identifier property. Further, the parent-node subset is associated with a “parent” node identifier which identifies the parent node of all trace events in the parent-node subset.

Within a span subset of trace events, one or more “parent-span” subsets can be identified based on the value of the parent span identifier property (e.g., the parent_span_id property described above) in the trace events in the span subset. In particular, all trace events in a span subset that pertain to a particular parent span have the same value for the parent span identifier property. Thus, all trace events in a span subset that pertain to a particular parent span can be identified by their common parent span identifier value. One or more parent-span subsets can be identified in a span subset of trace events based on the parent span identifier value in the trace events in the span subset. For each such parent-span subset, all trace events in the parent-span subset have the same values for the trace identifier, the span identifier, and the parent span identifier properties.

3.4 Computing Span Metric Aggregates

At step 608, according to an embodiment, one or more span metric aggregates are computed for each node subset identified in a set of trace events collected during a sample period.

A number of different span metric aggregates can be computed from a node subset.

In an embodiment, a span metric aggregate is computed for a node subset as the count of the number of requests/calls received by the node during the sample period. The number of requests/calls received by the node may be counted as the number of trace events in the node subset of type “request_recv” or of type “call_recv”.

In an embodiment, a span metric aggregate is computed for a node subset as the count of the number of responses/replies sent by the node during the sample period. The number of responses/replies sent by the node may be counted as the number of trace events in the node subset of type “response_sent” or of type “reply_sent”.

In an embodiment, one or more span metric aggregates are computed for a node from the values of the “error_code” property in the node subset identified for the node. In an embodiment, one or more span metric aggregates are computed for a node from the values of the “server_side_total_time” property in the node subset identified for the node. In both cases, the span metric aggregates may be computed from trace events in the node subset of certain trace event types. For example, the span metric aggregates may be computed from trace events in the node subset of type “response_sent” or type “reply_sent”.

In an embodiment, a span metric aggregate is computed for a node subset as the count of the number of errors in the node subset. An error may be counted if a trace event in the node subset of type “response_sent” or type “reply_sent” has a value for the “error_code” property that indicates that an error occurred. For example, for the HTTP protocol, a value of “500” for the “error_code” property may indicate that an error occurred. An error may also be counted if a given trace event in the node subset of type “request_recv” or of type “call_recv” has no corresponding trace event in the node subset of type “response_sent” or “reply_sent”. This indicates that an initiator request or an IPC call was received by the node but that the node was unable to send a response or reply. The corresponding trace event, if present in the node subset, would have the same value for the trace identifier property as the given trace event. Thus, the absence of a corresponding trace event in the node subset with a value for the trace identifier property equal to the value for the trace identifier property in the given trace event may indicate that an error occurred.

In an embodiment, a span metric aggregate is computed for a node subset as the rate of errors in the node subset. The rate may be computed over the number of spans in the node subset (e.g., the number of errors per number of spans in the node subset) or over a period of time (e.g., the number of errors per period of time). The number of spans may be counted as the number of trace events in the node subset of type “response_sent” or type “reply_sent”. This count is also a count of the number of spans in the node subset for which a response or a reply is sent by the node. Alternatively, the number of spans may be counted as the number of trace events in the node subset of type “request_recv” or type “call_recv”. This count is also a count of the number of spans in the node subset for which a request or a call is received by the node.

In an embodiment, a span metric aggregate is computed for a node subset as the average span time. The average span time may be computed as an average of the values of the “server_side_total_time” property in trace events in the node subset of type “response_sent” or type “reply_sent”. In some embodiments, only trace events of type “response_sent” or type “reply_sent” with a value for the “error_code” property indicating that the response or reply was successfully sent are considered in the average computation. The computed average can be an arithmetic mean of the “server_side_total_time” values, a median of the values, a truncated mean of the values, a weighted mean of the values, a moving average, or another mathematical average of the values.
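
For illustration, the count, error count, error rate, and average span time aggregates described in this section might be computed from a node subset as sketched below; the HTTP 5xx error test and the arithmetic mean are simplifying assumptions.

    from statistics import mean

    def span_metric_aggregates(node_subset):
        received = [e for e in node_subset
                    if e["type"] in ("request_recv", "call_recv")]
        sent = [e for e in node_subset
                if e["type"] in ("response_sent", "reply_sent")]
        # Simplified error test: an HTTP 5xx error_code counts as an error.
        errors = [e for e in sent
                  if str(e.get("error_code") or "").startswith("5")]
        ok_sent = [e for e in sent if e not in errors]
        times = [e["server_side_total_time"] for e in ok_sent
                 if e.get("server_side_total_time") is not None]
        return {
            "calls_received": len(received),
            "replies_sent": len(sent),
            "error_count": len(errors),
            "error_rate": len(errors) / len(sent) if sent else 0.0,
            "avg_span_time": mean(times) if times else None,  # arithmetic mean
        }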

Any of the above span metric aggregates computed for a node subset can be similarly computed for a parent-node subset by considering only the trace events in the parent-node subset.

3.5 Displaying Span Metric Aggregates

Once one or more span metric aggregates have been computed based on trace events collected from nodes in an online distributed computer system during a sample period, they may be presented to a user in a graphical user interface. At steps 610 and 620, a graphical representation of at least one of the one or more system nodes for which a span metric aggregate is computed is displayed in a graphical user interface in conjunction with display of the span metric aggregate. In an embodiment, the graphical user interface comprises a call graph that visually conveys the IPC call dependencies between system nodes during the sample period.

FIG. 9 illustrates an example call graph 902 that may be generated and presented in a graphical user interface 900 to a user after one or more span metric aggregates have been computed. Instructions and data for generating graphical user interface 900 at a user's computing device may be generated by a web server and served to the user's computing device for processing by a web browser application executing on the user's computing device. For example, the instructions may include HyperText Markup Language (HTML) data, eXtensible Markup Language (XML) data, digital image data, or other data or instructions suitable for rendering graphical user interface 900 at the user's computing device.

As shown, the call graph 902 comprises a number of visual nodes represented as circles in the call graph 902. Each visual node corresponds to a node in the online system. A visual edge connecting two visual nodes represents one or more IPC calls during the sample period between the two nodes corresponding to the two visual nodes connected by the visual edge. Each visual node in the call graph 902 also corresponds to a node subset. Each visual edge in the call graph also corresponds to a parent-node subset.

In an embodiment, visual nodes in the call graph 902 are labeled with their respective node identifiers. For example, visual node 904A is labeled with the node identifier label 906A of “DMS”, visual node 904B is labeled with the node identifier label 906B of “ABCLOUD”, and visual node 904C is labeled with the node identifier label 906C of “TESTEPREF”.

In some embodiments, visual nodes are color coded to indicate performance problems. For example, a visual node may be colored red if the number of errors for the corresponding node during the sample period or the error rate for the corresponding node during the sample period exceeds a threshold. As another example, a visual node may be colored according to an average span time for the corresponding node during the sample period. For example, a visual node may be colored green to indicate that the average span time for the corresponding node during the sample period was below a first level threshold, colored yellow if the average span time was above the first level threshold but below a second level threshold, or colored red if the average span time was above the second level threshold.
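
A minimal sketch of the two-threshold coloring rule; the threshold values here are illustrative only.

    def node_color(avg_span_time, first_level=100, second_level=500):
        # Thresholds in milliseconds; values are illustrative assumptions.
        if avg_span_time < first_level:
            return "green"
        if avg_span_time < second_level:
            return "yellow"
        return "red"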

A visual edge connecting two visual nodes in the call graph 902 represents an IPC call dependency during the sample period between the nodes corresponding to the two visual nodes connected by the visual edge. For example, visual edge 908A connecting visual nodes 904A and 904B in call graph 902 represents one or more IPC calls from the DMS node to the ABCLOUD node during the sample period.

In some embodiments, visual edges are color coded to indicate performance problems. For example, a visual edge may be colored red if the number of errors for the corresponding parent-node subset during the sample period or the error rate for the corresponding parent-node subset during the sample period exceeds a threshold. As another example, a visual edge may be colored according to an average span time for the corresponding parent-node subset during the sample period. For example, a visual edge may be colored green to indicate that the average span time for the corresponding parent-node subset during the sample period was below a first level threshold, colored yellow if the average span time was above the first level threshold but below a second level threshold, or colored red if the average span time was above the second level threshold.

In some embodiments, the call graph 902 is interactive. In particular, a user may direct input to the graphical user interface to display computed span metric aggregates. In this way, the user can obtain detailed performance information about selected nodes of interest.

In one embodiment, when a user directs user input to a visual edge of the call graph 902, one or more span metric aggregates computed for the corresponding parent-node subset are displayed. For example, graphical user interface dialog 910A may be displayed in response to user input (e.g., touch gesture, mouse over, click, etc.) directed to visual edge 908A. The dialog 910A includes a trend chart 911A and a bar chart 912A. Trend chart 911A graphs a trend in the number of IPC calls from node “DMS” to node “ABCLOUD” over the sample period. Bar chart 912A charts the number of IPC replies from node “ABCLOUD” to node “DMS” over the sample period by status code (e.g., HTTP status code).

In one embodiment, when a user directs user input to a visual node of the call graph 902, one or more span metric aggregates computed for the corresponding node subset are displayed. For example, graphical user interface dialog 910B may be displayed in response to user input (e.g., touch gesture, mouse over, click, etc.) directed to visual node 904C. The dialog 910B includes a trend chart 911B. Trend chart 911B graphs a trend in the number of IPC calls from all parent nodes to node “TESTEPREF” during the sample period.

4.0 Targeted Tracing

As discussed above, initiator requests can be traced on a uniformly random basis. For example, one out of every N initiator requests can be selected for tracing. While uniformly random tracing can be useful for purposes of distributed trace aggregation, there may be some circumstances where targeted tracing is desired. For example, a user may report an error when making a certain request of an online service. In this case, the online service provider may wish to trace a selected subset of all subsequent user requests of the online service in order to diagnose the root cause of the error. For example, the online service provider may wish to trace all subsequent user requests from the user that reported the error. Such targeted tracing is not possible with a distributed tracing technology that traces initiator requests only in a uniformly random way.

With targeted distributed tracing, trace events for specifically targeted initiator requests are generated. In an embodiment, the targeted initiator requests are HTTP requests, and the HTTP requests can be targeted based on fields in the HTTP request header and name-value pairs in the query string portion of the HTTP request Uniform Resource Locator (URL). To do so, edge nodes of the online distributed computer system are configured with one or more targeted trace queries. When an HTTP request from an initiator arrives, the edge node evaluates the HTTP request against the one or more queries. If any one of the queries is satisfied by the HTTP request, then distributed tracing is enabled for the request. The edge node can enable distributed tracing for the request by setting an appropriate field or value in the trace context for the request that is propagated within nodes of the online distributed computer system in thread-local storage and between nodes of the online distributed computer system in interprocess communication calls.

So that trace events generated for a targeted initiator HTTP request can be associated with the query that caused the request to be targeted, the edge node that enables tracing for the request generates a special “targeted” trace event that includes the probabilistically unique trace identifier assigned to the request and the targeted trace query (or an identifier thereof) that the request satisfied. By doing so, when all trace events generated with that trace identifier, including the targeted trace event, are collected, it can be determined from the presence of the targeted trace event in the collection: a) that the initiator HTTP request assigned that trace identifier was a targeted request, and b) the targeted trace query the targeted request satisfied.

FIG. 10 is a flow diagram of distributed tracing technology for targeted distributed tracing. Initially, a user 1002 provides a trace query to targeted tracing configuration server 1006 through the user's computer 1004. The user 1002 can provide the trace query through a user interface presented at the user's computer 1004 such as, for example, a command line interface or a graphical user interface (e.g., a web page sent from configuration server 1006).

In general, the trace query is a set of name-expression pairs. Each name and expression can be formatted as a character string data type. Each name of a name-expression pair may correspond to an HTTP request field name, and the expression of the name-expression pair corresponds to the value of that HTTP request field name. The expression can be a literal character string value or a regular expression for pattern matching against the value of the corresponding HTTP request field name. The HTTP request field can be a field (i.e., a name-value pair) in the query string portion of the HTTP request URL or a header field in the HTTP request header portion of the HTTP request. Thus, for purposes of evaluating a trace query against an incoming HTTP request, there may be no distinction made between HTTP request fields from the query string portion of the HTTP request URL and fields from the HTTP request header.

In some instances, a trace query is a set of name-expression pairs arranged in a Boolean expression in which name-expression pairs are related to one another by one or more Boolean operators such as AND, OR, and NOT, and precedence operators [e.g., open/closed parentheses ( )], to form an overall Boolean expression that as a whole evaluates to either TRUE or FALSE when applied to a given HTTP request.

Once the user 1002 has provided the trace query through the user interface at the user's computer 1004, it is sent to the configuration server 1006, which sends the trace query to some or all of the edge nodes 1008 in the online distributed computer system. Each edge node 1008 may maintain a list of one or more different trace queries, where each of the different trace queries targets different initiator HTTP requests.

When an edge node 1008 configured with one or more trace queries receives an HTTP request from an initiator, the edge node 1008 evaluates each of the trace queries in its list against the incoming HTTP request. This evaluation may be performed as part of HTTP request handling. In particular, the evaluation may be performed after the HTTP request URL and request headers have been parsed and stored in an appropriate data structure such as an associative array, dictionary, or other mapping between HTTP request field names and their values.

For evaluation purposes, a given name-expression pair in a trace query is satisfied by the incoming HTTP request if the HTTP request has an HTTP field with the specified name and its value matches the specified expression. For purposes of matching the HTTP request field value to the specified expression, such matching may be case insensitive. If the specified expression is a regular expression, then the regular expression is evaluated against the HTTP request field value to determine if there is a match. Otherwise, the HTTP request field value matches the specified expression if there is an exact case-insensitive match.

For a trace query with multiple name-expression pairs, the trace query is satisfied if a Boolean expression relating the multiple name-expression pairs together evaluates to TRUE. This involves evaluating at least one name-expression pair, depending on the particular Boolean expression at hand. If the trace query is simply a set of two or more name-expression pairs, then the multiple name-expression pairs can be related together in the disjunctive or the conjunctive, according to a default.
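
For illustration, trace query evaluation might be sketched as follows, treating every expression as a case-insensitive regular expression (a literal string then matches exactly unless it contains regular expression metacharacters, a simplification of the matching rules above) and relating the pairs conjunctively by default; the query and field values shown are hypothetical.

    import re

    def query_satisfied(trace_query, http_fields):
        # http_fields merges URL query string fields and request header
        # fields into one name -> value mapping, per the description above.
        for name, expression in trace_query.items():
            value = http_fields.get(name)
            if value is None:
                return False
            if not re.fullmatch(expression, value, flags=re.IGNORECASE):
                return False
        return True

    # Hypothetical query: target requests from a particular user on iPhones.
    query = {"user_id": "12345", "user-agent": ".*iPhone.*"}
    fields = {"user_id": "12345", "user-agent": "Mozilla/5.0 (iPhone; ...)"}
    assert query_satisfied(query, fields)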

If the edge node 1008 that receives an incoming initiator HTTP request determines that the HTTP request satisfies a trace query, then the edge node 1008 enables tracing for that HTTP request. Such enabling involves setting a particular field value in the trace context for the HTTP request so that the other system nodes 1010 of the online distributed computer system generate trace events when handling interprocess communication calls in the request path of the HTTP request. As mentioned above, the trace context can be generated by the edge node and stored in thread-local storage for propagation to other instrumentation points at the edge node. Further, the trace context can be propagated to other nodes 1010 in interprocess communication calls. For example, the trace context stored in thread local storage and sent in interprocess communication calls can have a field or value indicating whether tracing is enabled for the current HTTP request. This field or value can be checked at the various execution points in the standard libraries to determine whether or not a trace event should be generated at the instrumentation point.

In addition to setting the appropriate field value in the trace context when a targeted HTTP request is identified, the edge node generates a special trace event. The special trace event allows processes and applications (e.g., search engine applications) that consume trace events from the data pipeline to determine whether a given distributed trace was targeted (or just uniformly randomly selected) and, if targeted, the trace query that targeted it. To do this, the specially generated trace event includes the trace identifier assigned to the HTTP request by the edge node and some indication that the HTTP request was targeted. This indication could be the trace query itself and/or the HTTP request fields that satisfied the trace query. The indication could also include a special trace event type value indicating that the trace is for a targeted initiator request.
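
A minimal sketch of the special trace event; the type value “targeted” and the field names are illustrative assumptions rather than the disclosed schema.

    def make_targeted_trace_event(trace_id, trace_query, matched_fields):
        return {
            "type": "targeted",           # marks this trace as targeted
            "trace_id": trace_id,         # same trace identifier as all other
                                          # trace events in the targeted trace
            "trace_query": trace_query,   # the query the request satisfied
            "matched_fields": matched_fields,
        }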

Another benefit of generating this special trace event is that trace query and/or HTTP request field information does not need to be propagated in the trace context or stored in other trace events generated for the targeted HTTP request. Instead, processes and applications that consume trace events can determine whether a set of trace events, all with the same trace identifier value, were generated for a targeted trace or not by examining the set for the presence or absence of the special trace event. If the special trace event is present in the set, then the trace was a targeted trace, and information about the trace query and/or matching HTTP request fields can be obtained from the special trace event. If the special trace event is not present in the set, then the trace is not a targeted trace (assuming a special trace event was never generated).

5.0 Extensions And Alternatives

The present disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the example embodiments herein that a person having ordinary skill in the art would comprehend. Similarly, where appropriate, the appended claims encompass all changes, substitutions, variations, alterations, and modifications to the example embodiments herein that a person having ordinary skill in the art would comprehend.

1-20. (canceled)
21. A method, comprising: generating a plurality of trace events at a plurality of system nodes of an online distributed computer system for a plurality of request paths, each trace event in the plurality of trace events being generated for a corresponding request path in the plurality of request paths and for a corresponding span of the corresponding request path, the corresponding span representing computation performed by the system node at which the trace event is generated; identifying a subset of the plurality of trace events pertaining to a particular system node; computing a span metric aggregate from one or more span metrics in the subset of the plurality of trace events; displaying, in a graphical user interface, a graphical representation of the particular system node; and displaying, in the graphical user interface, the span metric aggregate in conjunction with the display of the graphical representation of the particular system node.
22. The method of claim 21, further comprising collecting the plurality of trace events from the plurality of system nodes.
23. The method of claim 21, wherein the computation performed by the system node occurs on behalf of an interprocess communication call from a parent span in the corresponding request path, wherein the parent span corresponds to one of the system nodes in the corresponding request path.
24. The method of claim 21, wherein identifying the subset of the plurality of trace events pertaining to the particular system node is based on one or more trace identifiers, one or more span identifiers, and one or more node identifiers in the subset of the plurality of trace events.
25. The method of claim 21, wherein the subset of the plurality of trace events additionally pertains to a particular parent system node of the particular system node.
26. The method of claim 25, wherein identifying the subset of the plurality of trace events pertaining to the particular parent system node and the particular system node is based on one or more trace identifiers, one or more span identifiers, one or more parent span identifiers, and one or more node identifiers in the subset of the plurality of trace events.
27. The method of claim 21, wherein the graphical user interface comprises a visual call graph having a set of visual nodes and one or more visual edges connecting the set of visual nodes.
28. The method of claim 21, wherein the span metric aggregate is displayed in response to detecting input directed to the graphical representation.
29. The method of claim 21, wherein the span metric aggregate is displayed in response to detecting input directed to a graphical representation of a graph edge visually connecting the graphical representation of the particular system node with a graphical representation of another node of the plurality of system nodes.
30. One or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of: generating a plurality of trace events at a plurality of system nodes of an online distributed computer system for a plurality of request paths, each trace event in the plurality of trace events being generated for a corresponding request path in the plurality of request paths and for a corresponding span of the corresponding request path, the corresponding span representing computation performed by the system node at which the trace event is generated; identifying a subset of the plurality of trace events pertaining to a particular system node; computing a span metric aggregate from one or more span metrics in the subset of the plurality of trace events; displaying, in a graphical user interface, a graphical representation of the particular system node; and displaying, in the graphical user interface, the span metric aggregate in conjunction with the display of the graphical representation of the particular system node.
31. The one or more non-transitory computer-readable media of claim 30, wherein the instructions further cause the one or more processors to perform collecting the plurality of trace events from the plurality of system nodes.
32. The one or more non-transitory computer-readable media of claim 31, wherein the computation performed by the system node occurs on behalf of an interprocess communication call from a parent span in the corresponding request path, wherein the parent span corresponds to one of the system nodes in the corresponding request path.
33. The one or more non-transitory computer-readable media of claim 31, wherein identifying the subset of the plurality of trace events pertaining to the particular system node is based on one or more trace identifiers, one or more span identifiers, and one or more node identifiers in the subset of the plurality of trace events.
34. The one or more non-transitory computer-readable media of claim 31, wherein the subset of the plurality of trace events additionally pertains to a particular parent system node of the particular system node.
35. The one or more non-transitory computer-readable media of claim 34, wherein identifying the subset of the plurality of trace events pertaining to the particular parent system node and the particular system node is based on one or more trace identifiers, one or more span identifiers, one or more parent span identifiers, and one or more node identifiers in the subset of the plurality of trace events.
36. The one or more non-transitory computer-readable media of claim 31, wherein the graphical user interface comprises a visual call graph having a set of visual nodes and one or more visual edges connecting the set of visual nodes.
37. The one or more non-transitory computer-readable media of claim 31, wherein the span metric aggregate is displayed in response to detecting input directed to the graphical representation.
38. The one or more non-transitory computer-readable media of claim 31, wherein the span metric aggregate is displayed in response to detecting input directed to a graphical representation of a graph edge visually connecting the graphical representation of the particular system node with a graphical representation of another node of the plurality of system nodes.
39. A computer system, comprising: a memory; and a processor that: generates a plurality of trace events at a plurality of system nodes of an online distributed computer system for a plurality of request paths, each trace event in the plurality of trace events being generated for a corresponding request path in the plurality of request paths and for a corresponding span of the corresponding request path, the corresponding span representing computation performed by the system node at which the trace event is generated; identifies a subset of the plurality of trace events pertaining to a particular system node; computes a span metric aggregate from one or more span metrics in the subset of the plurality of trace events; displays, in a graphical user interface, a graphical representation of the particular system node; and displays, in the graphical user interface, the span metric aggregate in conjunction with the display of the graphical representation of the particular system node.
40. The computer system of claim 39, wherein the processor identifies the subset of the plurality of trace events pertaining to the particular system node based on one or more trace identifiers, one or more span identifiers, and one or more node identifiers in the subset of the plurality of trace events.